Joining Extractions of Regular Expressions
Authors
Abstract
Regular expressions with capture variables, also known as “regex formulas,” extract relations of spans (interval positions) from text. These relations can be further manipulated via Relational Algebra as studied in the context of document spanners, Fagin et al.’s formal framework for information extraction. We investigate the complexity of querying text by Conjunctive Queries (CQs) and Unions of CQs (UCQs) on top of regex formulas. We show that the lower bounds (NP-completeness and W[1]-hardness) from the relational world also hold in our setting; in particular, hardness already hits single-character text! Yet, the upper bounds from the relational world do not carry over. Unlike in the relational world, acyclic CQs, and even gamma-acyclic CQs, are hard to compute. The source of hardness is that it may be intractable to instantiate the relation defined by a regex formula, simply because it has an exponential number of tuples. Yet, we are able to establish general upper bounds. In particular, UCQs can be evaluated with polynomial delay, provided that every CQ has a bounded number of atoms (while unions and projection can be arbitrary). Furthermore, UCQ evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the parameter is the size of the UCQ.
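To make "relations of spans" and "CQs on top of regex formulas" concrete, here is a minimal brute-force sketch. It is not the paper's algorithm. It assumes only unary atoms of the form Σ* x{γ} Σ*, where γ is a Python `re` pattern, and it represents a span as a half-open interval [i, j). The helper names `spans_matching`, `atom` and `join` are hypothetical. The sketch also shows why instantiating relations can blow up: joining k atoms that share no variables is a Cartesian product, so the output grows exponentially in k.

```python
import re
from itertools import product

Span = tuple[int, int]  # half-open interval [i, j) over positions of the text


def spans_matching(text: str, pattern: str) -> list[Span]:
    """All spans [i, j) whose substring fully matches `pattern`.

    Brute force over all O(n^2) spans, so *every* match is reported,
    not only the non-overlapping matches that re.finditer would return.
    """
    compiled = re.compile(pattern)
    n = len(text)
    return [
        (i, j)
        for i in range(n + 1)
        for j in range(i, n + 1)
        if compiled.fullmatch(text[i:j])
    ]


def atom(var: str, text: str, pattern: str) -> list[dict[str, Span]]:
    """Hypothetical helper: unary span relation of the formula  Σ* var{pattern} Σ*."""
    return [{var: s} for s in spans_matching(text, pattern)]


def join(r1, r2):
    """Natural join: combine tuples that agree on every shared variable."""
    result = []
    for t1, t2 in product(r1, r2):
        shared = t1.keys() & t2.keys()
        if all(t1[v] == t2[v] for v in shared):
            result.append({**t1, **t2})
    return result


text = "ab12 cd3"
words = atom("x", text, "[a-z]+")  # every lowercase span, not only maximal words
nums = atom("y", text, "[0-9]+")
print(len(words), len(nums))  # 6 4

# CQ  Q(x) :- Word(x), AB(x): both atoms constrain the same variable x
print(join(words, atom("x", text, "ab")))  # [{'x': (0, 2)}]

# Cartesian-product blow-up: 3 atoms with distinct variables over "aaa".
# Each atom has 10 spans (all i <= j, including empty ones), so 10 ** 3 tuples.
r = atom("x1", "aaa", "a*")
for k in range(2, 4):
    r = join(r, atom(f"x{k}", "aaa", "a*"))
print(len(r))  # 1000
```

This toy evaluator materializes every atom's relation in full before joining, which is precisely the step the abstract identifies as potentially intractable. The polynomial-delay and FPT-delay upper bounds stated above are for algorithms that do not work this way.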
Similar resources
Discrete Time Analysis of Multi-Server Queueing System with Multiple Working Vacations and Reneging of Customers
This paper analyzes a discrete-time $Geo/Geo/c$ queueing system with multiple working vacations and reneging in which customers arrive according to a geometric process. As soon as the system becomes empty, the servers go on a working vacation all together. The service times during the regular busy period, the working vacation period and vacation times are assumed to be geometrically distributed. Customer...
Transformation Between Regular Expressions and ω-Automata
We propose a new definition of regular expressions for describing languages of ω-words, called ∞-regular expressions. These expressions are obtained by adding to the standard regular expression on finite words an operator ∞ that acts similarly to the Kleene star but can be iterated finitely or infinitely often (as opposed to the ω-operator from standard ω-regular expressions, which has to be iterat...
Derivatives for Enhanced Regular Expressions
Regular languages are closed under a wealth of formal language operators. Incorporating such operators in regular expressions leads to concise language specifications, but the transformation of such enhanced regular expressions to finite automata becomes more involved. We present an approach that enables the direct construction of finite automata from regular expressions enhanced with further o...
Obtaining shorter regular expressions from finite-state automata
We consider the use of state elimination to construct shorter regular expressions from finite-state automata (FAs). Although state elimination is an intuitive method for computing regular expressions from FAs, the resulting regular expressions are often very long and complicated. We examine the minimization of FAs to obtain shorter expressions first. Then, we introduce vertical chopping based o...
Shorter Regular Expressions from Finite-State Automata
We consider the use of state elimination to construct shorter regular expressions from finite-state automata. Although state elimination is an intuitive method for computing regular expressions from finite-state automata, the resulting regular expressions are often very long and complicated. We examine the minimization of finite-state automata to obtain shorter expressions first. Then, we introd...
Journal: CoRR
Volume: abs/1703.10350
Publication date: 2017